Abstract

The pervasiveness and availability of mobile phone data offer the opportunity to discover
usable knowledge about crowd behaviors in urban environments. Cities can leverage such
knowledge to provide better services (e.g., public transport planning, optimized resource
allocation) and to improve safety. Call Detail Record (CDR) data represents a practical data
source for detecting and monitoring unusual events, given the high level of mobile phone
penetration compared with GPS-equipped and open devices. In this paper, we provide a
methodology that is able to detect unusual events from CDR data, which typically has low
accuracy in terms of space and time resolution. Moreover, we introduce a concept of unusual
event that involves a large number of people who exhibit unusual mobility behavior. Our
careful consideration of the issues that come from coarse-grained CDR data ultimately leads
to a completely general framework that can detect unusual crowd events from CDR data
effectively and efficiently. Through extensive experiments on real-world CDR data for a large
city in Africa, we demonstrate that our method can detect unusual events with 16% higher
recall and over 10 times higher precision, compared to state-of-the-art methods. We implement
a visual analytics prototype system to help end users analyze detected unusual crowd events
to best suit different application scenarios. To the best of our knowledge, this is the first
work on the detection of unusual events from CDR data that considers its temporal and spatial
sparseness and the distinction between users' unusual activities and daily routines.


1 Introduction 

The ubiquity of mobile devices offers an unprecedented opportunity to analyze the trajectories of
moving objects in an urban environment, which can have a significant effect on city planning,
crowd management, and emergency response [1]. The big data generated from mobile devices
thus provides a powerful new social microscope, which may help us to understand human
mobility and discover the hidden principles that characterize the trajectories defining human
movement patterns. Cities can leverage the results of the analytics to better provide and plan
services for citizens as well as to improve their safety. For example, during expected or chaotic
events such as riots, parades, big sport events, and concerts, the city should be able to respond
proactively by allocating the correct amount of resources, adapting public transport services,
and more generally adopting all possible actions to safely handle such events. Many methods
have been proposed in the literature to detect groups of people moving together from a
trajectory database [Hll 1611181132], specifically from GPS data. However, only a very small
percentage of people currently carry GPS devices and share their movement trajectories with
a central entity that can use them to identify crowd events.

In this paper, we study the problem of unusual event detection from mobile phone data that is
opportunistically collected by telecommunication operators, in particular the Call Detail Records
(CDR). In 2013, the number of mobile-phone subscriptions reached 6.8 billion, corresponding
to a global penetration of 96%. The pervasiveness of mobile phones is spreading fast, with
the number of subscriptions reaching 7.3 billion by 2014, according to a recent report by the
International Telecommunication Union (ITU) at the 2013 Mobile World Congress [7]. Therefore,
CDR data represents a practical data source to detect and monitor unusual events considering
the high level of mobile phone penetration. This is especially useful in developing countries,
where other means of gathering crowd movement data (e.g., GPS or cameras) are very expensive
to install.

The task of detecting unusual events from CDR data is very different from previous work on
fine-grained trajectory data, such as GPS data, and presents several unique challenges.
Temporal sparseness: CDR data only records the user location when a call or text message is
made or received, and is thus temporally sparse, since the call or message frequency of users
is usually low and unpredictable. Spatial sparseness: The location of users when they make a
call or send a message is recorded as the location of the antenna, which makes CDR data
spatially sparse. Non-routine events: Our objective is to detect unusual crowd events from
human daily movements, which mostly consist of usual routines. Thus, it is necessary to
discriminate unusual crowd movements from routine trajectories.

To address these challenges, we aim to estimate the location of users in the absence of
spatio-temporal observations (i.e., when the users do not make phone calls), detect groups of
people moving together, and proactively discover unusual events. We propose a general
framework to infer unusual crowd events from mobile phone data. Specifically, our contributions
can be summarized as follows:

• We first define the cylindrical cluster to capture sparse spatio-temporal location data and
provide practical methods to extract crowd events from CDR data, and further formalize
the unusual crowd event detection problem by considering the similarity between
individuals’ trajectories and their historical mobility profiles.

• We provide a Visual Analytics Prototype System to help the end user (e.g., a city manager
or analyst) analyze the detected crowd events and set the values of the parameters to best
suit an application scenario.

• Finally, we evaluate our proposed framework on a real-world CDR dataset and demonstrate
its effectiveness and efficiency. Our method significantly outperforms (10× precision and
+16% recall) previous event detection methods designed for GPS data, as verified against
real-world unusual crowd events.

Figure 1. Process flow of the system to detect unusual events.

The mobile phone CDR data used in this work was collected in Cote d’Ivoire over five months,
from December 2011 to April 2012. During that period, this African country faced the Second
Ivorian Civil War and political crisis¹. Following the election of a new president and parliament,
continued outbreaks of post-election conflict occurred, including boycotts, violence, and
protests. The experimental results on this real-world dataset demonstrate the effectiveness of
our proposed methodology and the importance of our work for the supervision of unusual crowds
and events in city and country management. Moreover, through the proposed method, city
managers and officials can gain insights into non-ticketed events taking place in public spaces,
which could lead to estimating the number of attendees and the event’s success. A particular
instance of such a method has recently been implemented to help the police and the event
organizers monitor visitors to the Mons 2015 - European Capital of Culture Opening
Ceremony [3].

2 Unusual Event Detection Problem 

Given the nature of CDR data, we face three major challenges in extracting accurate individual
trajectories. First, we can only record user locations when they make or receive calls
(or text messages). As most mobile users do not make phone calls frequently and periodically,
positions are not regularly sampled, as opposed to GPS navigation systems. Moreover, mobile
users do not follow a call pattern consistent with that of others in the group. Second, when a
user makes a call, CDR data only records the base station she is using, providing very
low-quality location information. Finally, the scenario that we are considering (e.g., going to
a protest) is not consistent with an individual’s daily activity pattern, such as going from home
to the office; thus we cannot leverage the previous history of the user to enrich her trajectory
and make it more accurate.

We formally define the problem of unusual event detection and decompose the problem into
different steps that enable us to solve the challenges brought by CDR data. Figure 1 shows the
process flow to detect unusual events. The system receives CDR data as input, extracts clusters,
and detects crowds from the sequences of clusters. Then, the system verifies some constraints
for each crowd and labels it as unusual if necessary. Subsequently, one or more unusual
crowds compose an unusual event.

Let $DB_{CDR} = \{call_1, call_2, \dots, call_n\}$ denote the set of all calls collected from a
mobile phone network. We define a call as a tuple $call_i = \langle t_i, v_j, l_k \rangle$, which
means that a user $v_j$ makes or receives a call at location $l_k$ at timestamp $t_i$, where
$v_j \in V$, $t_i \in T$, $l_k \in L$. $V$ is the set of all users and $T$ denotes all possible
timestamps. Specifically, $l_k$ stands for the geographical location of the mobile network
antenna and $L$ is the set of locations of all antennas found in $DB_{CDR}$. We define the
individual mobility trajectory [iniisz] for each user as follows.

¹ http://en.wikipedia.org/wiki/2010-11_Ivorian_crisis

Definition 1. Individual Trajectory: A user $v_j$’s mobility trajectory from start time $t_p$ to
end time $t_q$ is defined as a sequence of spatio-temporal tuples $\langle t_i, l_i \rangle$ where
$t_p < t_i < t_q$, and each such sequence belongs to $S$. $S$ stands for the set of user
trajectory sequences.
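As a minimal sketch of how call tuples can be turned into per-user trajectories, the snippet below groups calls by user and orders them by time. The `Call` type, the function name `build_trajectories`, the integer timestamps, and the inclusive time bounds are illustrative assumptions and are not taken from the original system.

```python
from collections import defaultdict
from typing import NamedTuple


class Call(NamedTuple):
    t: int        # timestamp t_i (assumed here to be an integer hour index)
    user: str     # user v_j
    antenna: str  # antenna location l_k


def build_trajectories(calls, t_start, t_end):
    """Return {user: [(t, antenna), ...]} ordered by time.

    Only calls with t_start <= t <= t_end are kept; treating the bounds
    as inclusive is an assumption of this sketch.
    """
    trajectories = defaultdict(list)
    for call in sorted(calls, key=lambda c: c.t):
        if t_start <= call.t <= t_end:
            trajectories[call.user].append((call.t, call.antenna))
    return dict(trajectories)


calls = [
    Call(3, "u1", "a2"),
    Call(1, "u1", "a1"),
    Call(2, "u2", "a1"),
    Call(9, "u2", "a3"),
]
trajectories = build_trajectories(calls, 1, 5)
```

Each trajectory is only as dense as the user's call activity, which is exactly the sparseness the following definitions are designed to handle.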

Cylindrical Cluster. The first step to identify crowd events from individual trajectories is
to find, at any specific timestamp, clusters of individuals that are very close in space. However,
since CDR data is very sparse on the time scale (i.e., users do not make calls regularly or in
synchrony with each other), we propose the concept of a cylindrical cluster in coarse-grained
spatio-temporal data. Finer-grained clustering, such as density-based clustering [8], cannot be
applied, as the antenna is the lowest level of spatial resolution available in the data. Indeed,
users are already clustered by association with the antenna they use at each call (which defines
a specific coverage area in the city, ranging from a few hundred square meters to a few
kilometers).

Definition 2. Cylindrical Cluster: Given a CDR database $DB_{CDR}$ which contains individual
calls with time and antenna information, a duration threshold $\varepsilon_t$, and a scale
threshold $\varepsilon_n$, the cylindrical cluster $CC_t$ at timestamp $t$ is a non-empty subset
of users $V_t \subseteq V$ satisfying the following conditions:

• Connectivity. $\forall v_j \in V_t$, $v_j$ makes at least one call by using antenna $a_x$ in the
interval $[t - \varepsilon_t, t + \varepsilon_t]$.

• Scale. The number of users $|V_t|$ in $CC_t$ is no less than $\varepsilon_n$.


Figure 2(a) shows an illustrative example for cylindrical clusters. Given a timestamp $t_1$, we
can see that user1, user2, user3, and user4 make calls during the time interval
$[t_1 - \varepsilon_t, t_1 + \varepsilon_t]$. Also, user3, user1, and user2 use the same antenna,
which is different from user4's. They are therefore clustered into two groups. One potential
issue is that there may exist multiple locations for a single user if she/he makes multiple calls
during the time interval $[t_1 - \varepsilon_t, t_1 + \varepsilon_t]$. A number of methods can be
considered to assign one single location from multiple locations, such as the central position
or the most common position. We use the most common position due to its ease of calculation
and understanding.


Crowd. In order to detect crowds lasting for a certain amount of time we need to consider 
shared characteristics between clusters detected in consecutive timestamps. 

Definition 3. Crowd: Given a CDR database $DB_{CDR}$ with individual trajectories, a lifetime
threshold $\varepsilon_{lt}$, a consecutive intersection threshold $\varepsilon_{ci}$, and a
commitment probability threshold $\varepsilon_p$, a crowd $C$ is a sequence of consecutive
cylindrical clusters $\{CC_{t_m}, CC_{t_{m+1}}, \dots, CC_{t_n}\}$ which satisfy the following
constraints:

• Movement. The number of total locations in one crowd is more than one.

• Durability. The lifetime of $C$, $C.lt$, namely the number of consecutive clusters, is greater
than $\varepsilon_{lt}$, i.e., $C.lt > \varepsilon_{lt}$ where $C.lt = n - m + 1$.

• Commitment. At least $\varepsilon_{ci}$ users appear in each cylindrical cluster with existence
probability $\varepsilon_p$.

Figure 2. Illustrative Examples of Cylindrical Cluster and Closed Crowd.


The movement and durability characterizations specify the types of crowd we are interested in.
The commitment instead characterizes the fact that a certain subset of users needs to participate
in all clusters. Again, due to the spatio-temporal sparsity of the CDR data, the computation
of the commitment of a user requires some further consideration. Therefore, we propose the
concept of existence probability, which is designed to overcome CDR sparsity. Indeed, as an
individual is not constantly making calls, consecutive timestamps may not see all users in the
cluster making calls.

We design the existence probability of one user being located in a cluster at timestamp $t$ as
the ratio of the number of users in $CC_t$ to the number of users in $CC_{t-1}$. The intuition
for the definition of existence probability is that the user tends to conform to others in the
group that she or he was assigned to [23]. For example, the existence probability of user3 in
Figure 2(a) at time $t_2$ is 1/3. At timestamp $t_1$, user1, user2, and user3 stay in cluster
$CC_{t_1}$, and one of them, user2, goes to cluster $CC_{t_2}$ at timestamp $t_2$. user1 and user3
do not make calls at timestamp $t_2$, which results in uncertainty about their locations. Thus,
we assign them the probability of staying with user2, who is in cluster $CC_{t_2}$. Furthermore,
we make the existence probability decay over time if a user does not appear in consecutive
timestamps, such as user4 at timestamps $t_3$ and $t_4$. Her existence probabilities in Figure
2(a) are [0,1, ^ x |] at each timestamp, respectively.
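One possible reading of this rule can be sketched as follows. The function name `existence_probabilities`, its inputs, the multiplicative decay, and the cap of the ratio at 1 are assumptions made for illustration; the text above does not fix these details.

```python
from fractions import Fraction


def existence_probabilities(observed, cluster_sizes):
    """Existence probability of one user along a sequence of clusters.

    observed[k]      -- True if the user made a call in the cluster at step k
    cluster_sizes[k] -- number of users seen calling in the cluster at step k

    Assumed rule: probability 1 when the user is observed; otherwise the
    previous probability multiplied by |CC_t| / |CC_{t-1}| (capped at 1),
    so the value decays while the user stays silent.
    """
    probs = []
    for k, seen in enumerate(observed):
        if seen:
            p = Fraction(1)
        elif k == 0 or cluster_sizes[k - 1] == 0:
            p = Fraction(0)  # never observed so far, nothing to propagate
        else:
            ratio = Fraction(cluster_sizes[k], cluster_sizes[k - 1])
            p = probs[-1] * min(ratio, Fraction(1))
        probs.append(p)
    return probs


# user3 in the example: seen at t1 in a cluster of 3 users, silent at t2
# where only one of those users (user2) is observed.
user3 = existence_probabilities([True, False], [3, 1])
```

Under these assumptions the sketch reproduces the value 1/3 quoted for user3 at $t_2$.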

Considering that a crowd is a sequence of clusters, we use the standard terminology of
sequential pattern mining: a crowd $C$ is called a closed crowd if it has no super-crowd,
that is, if no super-sequence containing $C$ exists.


Unusual Crowd. Usually, people have their own mobility trajectories in daily life, such as
going from home to the workplace every day. When people attend a concert or a protest, their
trajectories differ from their usual ones. The definition of crowd given above includes both usual
daily trajectories (e.g., commuting) and unusual event trajectories (e.g., protests). This
is, for instance, what the method in [32] aims to do. As we will show in the experiments section,
such a method generates an enormous number of events, as opposed to what a city would need
in order to identify specific unusual events. Here we define the concept of mobility profile to
capture people’s normal movement behaviors; by comparing against it, we can detect abnormal
mobility behaviors.

Definition 5. Mobility Profile: Given a CDR database $DB_{CDR}$ with individual trajectories,
one’s mobility profile is the group of locations she/he visited for each time unit (hour) in every
day. Notice that a location here corresponds to an antenna.
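A minimal sketch of such a profile is a per-user, per-hour count of visited antennas. The names `build_mobility_profiles` and `visit_frequency`, and the choice to summarize a profile as a relative visit frequency, are assumptions for illustration only.

```python
from collections import Counter, defaultdict


def build_mobility_profiles(history):
    """history: iterable of (user, hour_of_day, antenna) records.

    Returns {user: {hour_of_day: Counter({antenna: number_of_visits})}}.
    """
    profiles = defaultdict(lambda: defaultdict(Counter))
    for user, hour, antenna in history:
        profiles[user][hour][antenna] += 1
    return profiles


def visit_frequency(profiles, user, hour, antenna):
    """Share of the user's historical records in this hour seen at this antenna."""
    counts = profiles.get(user, {}).get(hour, Counter())
    total = sum(counts.values())
    return counts[antenna] / total if total else 0.0


history = [
    ("u1", 8, "a1"),
    ("u1", 8, "a1"),
    ("u1", 8, "a2"),
    ("u1", 20, "a3"),
]
profiles = build_mobility_profiles(history)
```

A single pass over the historical records is enough to fill the structure, which matches the one-scan construction described later in the framework.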

Definition 6. Unusual Crowd: Given the mobility profiles of users and a similarity threshold
$\varepsilon_{si}$, a closed crowd $C$ is said to be an unusual crowd $UC$ if the average
similarity between the trajectory of each user in the crowd and her/his mobility profile in the
corresponding time intervals is less than $\varepsilon_{si}$.

Unusual Event Detection. Due to the inaccuracy of CDR data and to the introduction of 
the existence probability concept, it is possible that two or more crowds share users and thus 
they represent the same event. Moreover, it is possible that many crowds might correspond to 
the same large event (e.g., two parades converging to the same square). To group together these 
unusual crowds, we define the concept of unusual event: 

Definition 7. Unusual Event: Given two unusual crowds $UC_i$ and $UC_j$, $UC_i$ and $UC_j$ are
connected into one unusual event if they satisfy the following principles:

• Overlapping: The ending time $C_i.t_{end}$ of crowd $C_i$ is temporally close to the beginning
time $C_j.t_{begin}$ of the other crowd $C_j$, w.r.t. $C_j.t_{begin} < C_i.t_{end}$.

• Sharing: The number of common users, $|C_i \cap C_j|$, is larger than or equal to half of the
total users $|C_i \cup C_j|$.

An unusual event is a set of unusual crowds $E = \{UC_1, UC_2, \dots, UC_n\}$ in which any two
unusual crowds are connected to each other by a path. A separate unusual crowd is also
an unusual event if it does not connect with others. Based on the concepts discussed above, we
formalize the unusual event detection problem as follows.

Problem 1. Unusual Event Detection: Given all detected crowds during the interval of 
two timestamps, the goal of unusual event detection is to extract all unusual events happening 
in the time interval. 

Unusual crowd event detection in mobile phone CDR data faces several unique challenges.
First, the sparseness of CDR data comes not only from the fact that a user’s location is recorded
only when a call is made, but also from the way this location is approximated by the coverage
area of the antenna used by the call. To address the temporal and spatial sparseness of
CDR data, we define the user existence probability, which compensates for the fact that a
user’s location is recorded only when a call is made, and we leverage the idea of the cylindrical
cluster to address the coarseness of user locations, which are recorded as the coverage area of
the antenna involved. Moreover, the problem is targeted at inferring unusual events rather than
people's daily routines. To achieve this, we propose the concept of mobility profile to
distinguish unusual crowding behavior from daily movements.

3 Unusual Event Detection Framework 

Given the formal definitions above, we now describe an innovative and efficient framework
to detect unusual crowd events from CDR data. Our framework is composed of four parts:
cylindrical cluster detection, closed crowd detection, unusual crowd detection, and unusual event
detection.

Cylindrical Cluster Detection. Given the database of the individual calls with the respective
time and antenna information, a duration threshold $\varepsilon_t$, and a scale threshold
$\varepsilon_n$, the Cylindrical Cluster Detection algorithm maintains at each timestamp $t$ the
set of users observed from each antenna $a_i$ in the time interval
$[t - \varepsilon_t, t + \varepsilon_t]$. Then, for each timestamp it returns all the sets of
users whose size is at least $\varepsilon_n$. All the detected cylindrical clusters are stored in
ClusterDB.
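The step above can be sketched for a single timestamp as follows. This is an illustrative sketch rather than the original implementation: the function name `cylindrical_clusters` and the `(timestamp, user, antenna)` record layout are assumptions, and a user with several calls in the window is assigned to her most common antenna, as described in Section 2.

```python
from collections import Counter, defaultdict


def cylindrical_clusters(calls, t, eps_t, eps_n):
    """Return {antenna: set_of_users} for clusters around timestamp t.

    calls  -- iterable of (timestamp, user, antenna)
    eps_t  -- duration threshold: the window is [t - eps_t, t + eps_t]
    eps_n  -- scale threshold: minimum number of users per cluster
    """
    antennas_by_user = defaultdict(Counter)
    for ts, user, antenna in calls:
        if t - eps_t <= ts <= t + eps_t:
            antennas_by_user[user][antenna] += 1

    clusters = defaultdict(set)
    for user, counts in antennas_by_user.items():
        # Most common antenna in the window; ties go to the antenna seen first.
        antenna = counts.most_common(1)[0][0]
        clusters[antenna].add(user)

    return {a: users for a, users in clusters.items() if len(users) >= eps_n}


calls = [
    (10, "u1", "a1"), (11, "u1", "a1"), (9, "u1", "a2"),
    (10, "u2", "a1"), (11, "u3", "a1"),
    (10, "u4", "a2"),
    (20, "u5", "a1"),  # outside the window around t = 10
]
clusters = cylindrical_clusters(calls, t=10, eps_t=1, eps_n=3)
```

Running the function once per timestamp and storing the results would yield the contents of ClusterDB in this sketch.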

Closed Crowd Detection. The input for crowd detection is the set of cylindrical clusters
ClusterDB extracted at each timestamp. Our crowd definition involves three constraints:
movement, durability, and commitment. Explicitly, if a subcrowd of a crowd meets the
durability and movement constraints, it also satisfies the commitment constraint. Thus the
crowd definition satisfies the downward closure property, and it is unnecessary to output all
crowds, including the subcrowds of closed crowds. To avoid the redundancy that results from
outputting subcrowds, we can follow the Lemma below to decide whether a crowd is closed.

Lemma. A crowd $C$ with clusters $\{CC_{t_m}, CC_{t_{m+1}}, \dots, CC_{t_n}\}$ is a closed crowd
if there does not exist $CC_{t_{m-1}}$ or $CC_{t_{n+1}}$ that can be added to crowd $C$ such
that a new crowd is formed.

The restriction of closed crowd contains two conditions: no suffix cluster can be appended to
it, and no prefix cluster can be merged at its front. To discover closed crowds in the cluster
database at the current timestamp $t$, the first condition is easy to check: if there exist
clusters at the next timestamp $t + 1$ that can be appended to the current crowd $C$, then the
process continues; if not, we only need to verify whether the current crowd $C$ is a subcrowd
of crowds formed at the current timestamp $t$. It is not necessary to check every crowd at
previous timestamps, because the current crowd at timestamp $t$ can only be a subcrowd of
crowds ending at timestamp $t$.

Figure 2(b) shows an illustrative example of this process. Suppose that crowds $C_1$ and $C_2$
have been found as closed crowds. If there is no cluster at the next timestamp that can be
appended to crowd $C_3$, then we need to further check whether it is a subcrowd of previous
crowds. It is impossible for $C_3$ to be a subcrowd of crowds ending at $t_4$ or earlier
timestamps, such as $C_1$, but it may be a subcrowd of crowds ending at the same timestamp
as $C_3$, such as $C_2$.

To find all closed crowds in ClusterDB, we iterate over the timestamps in increasing order.
At each timestamp $t$, we check whether each candidate crowd at timestamp $t - 1$ can be
extended by clusters at timestamp $t$. If the candidate crowd satisfies the movement
and durability constraints, and at the same time is not a subcrowd of crowds ending at
timestamp $t - 1$, then we output the current candidate as a closed crowd. The current
candidate crowd can then be extended by one more cluster to form a new candidate crowd at
$t$. The candidate crowd set contains all crowds that can be extended by a new cluster at $t$.
We then add all clusters at timestamp $t$ to it to form the new candidate crowd set at $t$.
This order of adding candidate crowds to the candidate set guarantees that we only need to
check whether a potential crowd is a subcrowd of closed crowds ending at the same timestamp.
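The subcrowd test at the heart of this procedure can be illustrated with a deliberately simple, brute-force sketch. It is not the incremental algorithm described above: it compares every pair of crowds, and it assumes that a crowd is represented as a tuple of cluster identifiers and that a subcrowd is a contiguous run of another crowd's clusters. The names `is_subcrowd` and `closed_crowds` are hypothetical.

```python
def is_subcrowd(small, large):
    """True if `small` appears as a contiguous run inside `large`.

    Both arguments are tuples of cluster identifiers, e.g. (timestamp, antenna).
    """
    n, m = len(small), len(large)
    return any(large[i:i + n] == small for i in range(m - n + 1))


def closed_crowds(crowds):
    """Keep only crowds that are not a proper subcrowd of another crowd."""
    unique = list(dict.fromkeys(crowds))  # drop duplicates, keep input order
    return [
        c for c in unique
        if not any(c != other and is_subcrowd(c, other) for other in unique)
    ]


c1 = (("t1", "a1"), ("t2", "a2"))
c2 = (("t1", "a1"), ("t2", "a2"), ("t3", "a2"))
c3 = (("t2", "a5"), ("t3", "a6"))
result = closed_crowds([c1, c2, c3])
```

The incremental procedure in the text avoids the all-pairs comparison by only checking crowds that end at the same timestamp.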

Complexity: The extraction of closed crowds is similar to the extraction of closed frequent 
sequential patterns whose complexity in the worst case can be approximated with * |T|) 

where $|A|$ is the number of antennas (i.e., clusters) and $|T|$ is the number of timestamps.

Unusual Crowd Detection. With the detected closed crowds, we further verify whether
their users present unusual or regular behaviors. As introduced in Section 2, we use the mobility
profile to decide whether users’ movement trajectories are unusual.

To generate the mobility profiles, we scan the historic CDR data once to record the specific
locations a user visited at each timestamp during every time period. For example, user4’s
existence probability vector in the corresponding crowd is $w_c$ = [0,1, 5 x |] in Figure 2(a).
Her profile vector is extracted from her mobility profile at the corresponding timestamps (from
$t_1$ to $t_4$), i.e., $w_m$ = [§, 2^5 4^5 2 ^]. There are several ways to define the similarity
between a user’s mobility profile and her trajectory in the crowd. We use cosine similarity to
calculate the similarity score, because of its ease of understanding and implementation. The
cosine similarity between two vectors $w_c$ and $w_m$ is defined as:
$CosSim(w_c, w_m) = \frac{w_c \cdot w_m}{\|w_c\| \, \|w_m\|}$.


The Unusual Crowd Detection algorithm first calculates, for each user in the crowd, the
similarity between her trajectory and her own mobility profile. The similarities obtained
are then averaged, and if the resulting value is less than the similarity threshold
$\varepsilon_{si}$ (Definition 6), the crowd is an unusual crowd.
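The test can be sketched in a few lines, following Definition 6 (an average similarity below the threshold marks the crowd as unusual). The function names and the representation of each user as a pair of equal-length vectors are assumptions for illustration; returning 0 for an all-zero vector is also a choice of this sketch.

```python
import math


def cosine_similarity(w_c, w_m):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(w_c, w_m))
    norm = math.sqrt(sum(a * a for a in w_c)) * math.sqrt(sum(b * b for b in w_m))
    return dot / norm if norm else 0.0


def is_unusual_crowd(user_vectors, eps_si):
    """user_vectors: list of (w_c, w_m) pairs, one per user in the crowd.

    w_c is the user's existence probability vector in the crowd and
    w_m the matching profile vector from her mobility profile.
    """
    sims = [cosine_similarity(w_c, w_m) for w_c, w_m in user_vectors]
    return sum(sims) / len(sims) < eps_si


# Users whose crowd presence is orthogonal to their routine: unusual.
off_routine = [([1, 0], [0, 1]), ([0, 1], [1, 0])]
# A user following her routine exactly: not unusual.
on_routine = [([1, 1], [1, 1])]
```

With a threshold such as 0.2, the first crowd would be flagged and the second would not.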

Complexity: The mobility profile construction requires a scan of the dataset, therefore its
complexity is $O(|DB_{CDR}|)$. The detection of unusual crowds requires, for each crowd, the
computation of the cosine similarity for all the users who are part of the crowd; thus its
complexity is $O(|C| \cdot |V|)$, where $|C|$ is the number of crowds and $|V|$ the number of
users.


Unusual Event Detection. With the discovered unusual crowds, we finally detect their
relationships and connect them into one event if they meet the requirements of Definition 7. In
this step, we use graph theory to find and generate unusual events. First, if two unusual crowds
satisfy both the overlapping and sharing principles, we create an edge to connect them. In the
resulting graph, each node is one unusual crowd and an edge indicates that two crowds
belong to the same event, so event detection amounts to generating all connected components
of the graph. Note that this graph may not only be disconnected but may also include isolated
nodes. Each component or isolated node is an unusual event, which is the final goal of this
work. The first part of this algorithm checks whether two unusual crowds can be connected to
each other according to the overlapping and sharing principles. The second part generates all
the components in the unusual crowd graph, for which any graph algorithm can be used. A
detected event contains the users in each cluster and the corresponding timestamps and
locations.
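The two parts of this step can be sketched with a pairwise connectivity test followed by a union-find pass to collect connected components. The `UnusualCrowd` type, its fields, and the exact reading of the overlapping principle (the later crowd begins before the earlier one ends) are assumptions made for this sketch.

```python
from typing import FrozenSet, NamedTuple


class UnusualCrowd(NamedTuple):
    t_begin: int
    t_end: int
    users: FrozenSet[str]


def connected(a, b):
    """Overlapping and sharing principles of Definition 7 (as read in this sketch)."""
    first, second = (a, b) if a.t_begin <= b.t_begin else (b, a)
    overlapping = second.t_begin < first.t_end
    union = a.users | b.users
    sharing = bool(union) and 2 * len(a.users & b.users) >= len(union)
    return overlapping and sharing


def unusual_events(crowds):
    """Group crowd indices into connected components (one list per event)."""
    parent = list(range(len(crowds)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(crowds)):
        for j in range(i + 1, len(crowds)):
            if connected(crowds[i], crowds[j]):
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(crowds)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


crowds = [
    UnusualCrowd(1, 4, frozenset({"a", "b", "c"})),
    UnusualCrowd(3, 6, frozenset({"b", "c", "d"})),
    UnusualCrowd(5, 8, frozenset({"b", "c", "d", "e"})),
    UnusualCrowd(20, 25, frozenset({"x", "y"})),
]
events = unusual_events(crowds)
```

In this example the first and third crowds are not directly connected, but they end up in the same event through the second crowd, while the last crowd forms an event on its own.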

Complexity: The detection of unusual events requires a pairwise comparison between all
the unusual crowds, therefore the complexity of this procedure is $O(|UC|^2)$, where $|UC|$ is
the number of unusual crowds.


4 Experiments 

4.1 Experimental Setup 

CDR Data. The D4D Orange challenge made available data collected in Cote d’Ivoire over
a five-month period, from December 2011 to April 2012. The datasets describe the call activity
of 50,000 users chosen randomly in every 2-week period. Specifically, the data contains the cell
phone tower and a timestamp at which the user sent or received a text message or a call, in
the form of the tuple <UserID, Day, Time, Antenna>. Each antenna is associated with location
information. To avoid privacy issues, the data has been anonymized by the D4D data provider.

Figure 3. Time series of unusual events, gatherings, users, and calls in the first two-week period.


From the CDR data, we find that about 63% of users do not make calls in consecutive hours
and 19% of users make calls in only two consecutive hours. This pattern demonstrates the
necessity of the existence probability for estimating a user’s location, as most users do not
make regular and consecutive calls at each timestamp. We also observe that the probability
that there is one hour between one user’s two calls is more than 75%, and that it is 8% for a
two-hour interval. In total, more than 80% of consecutive call pairs have intervals of at most
two hours. These observations demonstrate the challenge of spatio-temporal sparseness in CDR
data and make the design of a decaying existence probability reasonable for coarse-grained
CDR data.
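Statistics of this kind can be obtained by measuring the gap between consecutive calls of each user. The sketch below is only an illustration of that computation: the function name `gap_distribution`, the `(user, hour_index)` input, and the toy records are assumptions and do not reproduce the figures reported above.

```python
from collections import Counter, defaultdict


def gap_distribution(calls):
    """calls: iterable of (user, hour_index).

    Returns {gap_in_hours: share_of_consecutive_call_pairs}, where pairs
    are formed between consecutive calls of the same user.
    """
    hours_by_user = defaultdict(list)
    for user, hour in calls:
        hours_by_user[user].append(hour)

    gaps = Counter()
    for hours in hours_by_user.values():
        hours.sort()
        gaps.update(later - earlier for earlier, later in zip(hours, hours[1:]))

    total = sum(gaps.values())
    return {gap: count / total for gap, count in gaps.items()} if total else {}


toy_calls = [("u1", 1), ("u1", 2), ("u1", 4), ("u2", 10), ("u2", 11)]
distribution = gap_distribution(toy_calls)
```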

Comparison Methods. To the best of our knowledge, this is the first work to detect unusual
crowds and events in spatio-temporal data, and it is also the first time that moving clusters
are discovered in CDR data. We compare the results of our approach with the methods
described in [32] (GAT) and in [17] (MOV), as the methods employed in these works are also
able to identify moving crowds. However, those methods were not designed to work on
CDR data and have to be adapted to perform the comparison. GAT defines a method to detect
gatherings in a trajectory dataset. A gathering is a sequence of spatial clusters with a certain
number of committed users who are members of a sufficient number of clusters. We use the same
settings as GAT for parameters that have the same physical meaning in both methods.
Clearly, by following our intuition and the goal of the problem design, there should not exist
any crowd or event on most days. Based on the results of the parameter analysis in Section [T3]
and the developed Visual Analytics System, we selected the following parameters:
$\varepsilon_n = 20$, $\varepsilon_{lt} = 4$, $\varepsilon_{ci} = 10$, $\varepsilon_p = 0.2$, and
$\varepsilon_{si} = 0.2$, since they correspond to a probability of finding unusual crowds in an
hour of around 10-15%, which helps us focus on rare events (as opposed to business-as-usual
events).

Table 1. Comparison of the unusual event detection (UE) and gathering detection (GAT) [32].


Period            | Date       | Event Name [22]                         | UE     | #UE | GAT    | #GAT
------------------+------------+-----------------------------------------+--------+-----+--------+-----
Dec. 05 - Dec. 18 | Dec. 07    | Anniversary of Felix Death              | ✓      | 20  | ✓      | 287
                  | Dec. 11    | Parliament election                     | ✓      |     | ✗      |
                  | Dec. 17    | Violence                                | ✓      |     | ✓      |
Dec. 19 - Jan. 01 | Dec. 25    | Christmas day                           | ✗      | 36  | ✗      | 56
                  | Dec. 31    | New year eve                            | ✓      |     | ✓      |
                  | Jan. 01    | New year day                            | ✓      |     | ✗      |
Jan. 02 - Jan. 16 | Jan. 08    | Baptism of Lord Jesus                   | ✓      | 31  | ✓      | 176
                  | Jan. 14    | Arbeen Iman Hussain                     | ✓      |     | ✗      |
Jan. 17 - Jan. 29 | Jan. 17    | Visit of Hillary Clinton                | ✓      |     | ✓      |
                  | Jan. 18    | Visit of Kofi Annan                     | ✓      |     | ✓      |
Jan. 30 - Feb. 12 | Jan. 30    | ACNF 2012 vs Angola                     | ✓      | 58  | ✓      | 310
                  | Feb. 04    | ACNF 2012 vs Equatorial Guinea          | ✓      |     | ✓      |
                  | Feb. 04    | Mawlid an Nabi Sunni                    | ✓      |     | ✓      |
                  | Feb. 05    | Yam                                     | ✓      |     | ✗      |
                  | Feb. 08    | ACNF 2012 Semi Final vs Mali            | ✗      |     | ✓      |
                  | Feb. 09    | Mawlid an Nabi Shia                     | ✓      |     | ✓      |
                  | Feb. 12    | ACNF 2012 Final vs Zambia               | ✓      |     | ✗      |
Feb. 13 - Feb. 26 | Feb. 13    | Post African Cup of Nations Recovery    | ✓      | 52  | ✓      |
                  | Feb. 22    | Ash Wednesday                           | ✓      |     | ✓      |
Feb. 27 - Mar. 10 |            | None                                    |        | 26  |        | 269
Mar. 11 - Mar. 25 | Mar. 12    | Election of National Assembly President | ✓      | 17  | ✓      | 31
                  | Mar. 13    | Election of National Prime Minister     | ✓      |     | ✓      |
Mar. 26 - Apr. 08 | Apr. 01-04 | Education International Congress        | ✓      | 75  | ✓      | 1990
                  | Apr. 06    | Good Friday                             | ✓      |     | ✓      |
Apr. 09 - Apr. 22 | Apr. 09    | Easter Monday                           | ✓      | 10  | ✓      | 33
                  | Apr. 13-14 | Assine fashion days                     | ✓      |     | ✓      |
------------------+------------+-----------------------------------------+--------+-----+--------+-----
Total             |            |                                         | 23/25  | 340 | 19/25  | 3326
Precision         |            |                                         | 0.0676 |     | 0.0057 |
Recall            |            |                                         | 0.9200 |     | 0.7600 |


4.2 Experimental Results 

Detected unusual events. Table 1 reports a series of events that occurred in Abidjan in the
different periods covered by the datasets. In order to perform a fair study of the effectiveness
of our method in comparison with GAT, we selected a third-party set of events reported in [22].
To limit the explosion in the number of detected gatherings, we set the most restrictive
values for the remaining parameters: $d = 0.0$, $k_p = 2$, and $m_p = 5$. Moreover, we report
the total number of events generated by both methods for each two-week period and
subsequently compute Precision and Recall scores for both algorithms. Our method detects a
lower number of unusual events than GAT, which is reflected in a higher value of Precision.
Although our method reports fewer events, it detects a greater number of ground truth events,
which corresponds to a higher value of Recall. Notice that the two measures are estimates of
precision and recall, since the full ground truth is not given. Indeed, the list of events in [22]
does not cover all events that happened in Ivory Coast in the monitored 5-month period, and
this explains the low precision of both methods. This is the reason why we did not try to find
the optimal values of the parameters to maximize Precision and Recall, but instead set such
values based on the general criterion of finding unusual crowds in only around 10-15% of the
hours. However, the list in Table 1 gives a good basis for comparison and shows that our
method is more than 10 times more precise than GAT.
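The scores in Table 1 follow directly from its totals, taking precision as matched events over detected events and recall as matched events over the 25 listed events. The short check below uses only the totals reported in the table; the function name `precision_recall` is an illustrative choice.

```python
def precision_recall(matched, detected, ground_truth):
    """Precision = matched / detected, recall = matched / ground_truth."""
    return matched / detected, matched / ground_truth


# Totals from Table 1: 25 listed events; UE matches 23 with 340 detections,
# GAT matches 19 with 3326 detections.
ue_precision, ue_recall = precision_recall(23, 340, 25)
gat_precision, gat_recall = precision_recall(19, 3326, 25)

precision_ratio = ue_precision / gat_precision  # how many times more precise UE is
recall_gain = ue_recall - gat_recall            # difference in recall
```

The ratio of the two precision values is about 11.8, and the recall difference is 0.16, which is the source of the "over 10 times" and "16%" figures.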

We perform another comparison with MOV, where the authors introduce the concept of
moving clusters. Extracting moving clusters is equivalent to running our method with the
parameters $\varepsilon_p$ (the probability of a user to be committed) and $\varepsilon_{si}$ (the
similarity threshold between the mobility profile and the trajectories in the event) set to 1.
With these parameter settings, the algorithm was not able to find any moving clusters. This is
due to the fact that our method is able to handle the spatial and temporal sparsity of CDR
data, while the MOV method is designed to work with GPS trajectories.


Time-series. We report the time series of the numbers of unusual events and gatherings 
detected with different input parameter settings by using our method and the GAT algorithm 
in Figures 3(a) and 3(b). We try to match the same parameters we used in our method. For the
minimum lifetime as well as the minimum number of objects that should belong to a cluster,
we choose the same values adopted in the study of the effectiveness of our method (ε_lt = 4
and ε_n = 20). The rest of the input parameters of the algorithm to detect gatherings are the
following: d is the minimum distance necessary to connect clusters detected in two consecutive
time snapshots; kp is the minimum number of time snapshots required to consider a user as a
participant; nip is the minimum number of participants to create a gathering. For these input
parameters, we tried different combinations to span the full admissible ranges. As can be
seen, the number of detected gatherings is very high even when the parameters are chosen to be
very restrictive. All the graphs show a daily trend, demonstrating that this method is not able
to find unusual events as defined in this paper. Indeed, a large number of gatherings is
detected every day, which might not correspond to specific unusual events.

To further evaluate our discoveries, we check the total communication volumes and the 
specific antenna activities. Figure 3(c) shows that between Dec. 06 and Dec. 18 the total
communication volume follows a periodic daily pattern, without obvious peaks corresponding to
the crowds we discovered on the anniversary of Felix's death on Dec. 07 and the Parliament
election on Dec. 11. Furthermore, the events on the day of the Parliament election involved five
antennas. Their communication activities are plotted in Figure 3(d). Clearly, there is no correlation
between the corresponding antenna activities and our unusual crowd/event output. These two
regular and stable time series of communication activities further confirm the effectiveness of
our problem design. These examples show that detecting unusual events is a complex task,
which cannot be easily accomplished by looking at outliers in call time series. Thus, methods
like [Smi] are not directly applicable.
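
To make the baseline being ruled out concrete, the following is a minimal Python 3 sketch of outlier detection on an hourly call-volume series. The z-score rule, the threshold of 3 and the toy series are illustrative assumptions, not values or methods taken from our experiments or from the cited work.

```python
from statistics import mean, pstdev


def zscore_outliers(volumes, threshold=3.0):
    """Return the indices whose call volume deviates from the series mean by
    more than `threshold` standard deviations (illustrative baseline only)."""
    mu = mean(volumes)
    sigma = pstdev(volumes)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(volumes) if abs(v - mu) / sigma > threshold]


# Toy periodic series: the same daily pattern repeated for a week.
daily = [10, 8, 6, 5, 7, 12, 20, 30, 35, 33, 30, 28]
series = daily * 7
print(zscore_outliers(series))  # nothing is flagged on a purely periodic series
```

A detector of this kind reacts only to changes in call volume. A crowd made of users who call at their usual rate but from an unusual place leaves the series unchanged and therefore goes unnoticed.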

Spatial distributions. Another comparison performed against GAT regards the spatial 
distribution of the detected events/gatherings. For both methods, we select the results obtained
on December 11th. For the GAT method we select the results with the lowest number of detected
gatherings. In Figures 4(a) and 4(b) we report the detected events and detected gatherings,
respectively. Notice that a Voronoi tessellation has been applied in order to associate a covering
area to each antenna. Our method detects 2 events: Event 1 (left) covers 3 antennas and
lasts for 4 hours, and Event 2 (right) covers 2 antennas and also lasts for 4 hours. For
the same day, the algorithm of Zheng et al. detects many gatherings (25) occurring in different 
places of the city. This is probably because the typical mobility profiles of the users
are not taken into account in the process, so recurring and unusual events are both detected.
Moreover, if we consider the lifetime of gatherings occurring in the same locations as our events,
we notice that it is generally longer. For example, a gathering covering the same antennas as
Event 1 lasts for 14 hours. Another characteristic of the gatherings is that they happen in
the same location at different times. In our model, instead, we define a method to consider
those as one large event. In summary, our method identifies events that
occur occasionally in a precise zone of the city and in a precise period of time, while the
other method detects several events without any distinction between the periodic ones and
the unusual ones.

Figure 4. The unusual events (a) detected by our method and the gatherings (b) detected by GAT
on December 11th. Colors range from green to red as a function of the number of detected
participants.
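
The Voronoi tessellation used for the antenna coverage areas assigns every location to its nearest antenna, so cell membership can be sketched without constructing the polygons. In the Python 3 sketch below the antenna identifiers and coordinates are hypothetical, and planar Euclidean distance is an assumption that is reasonable only at city scale.

```python
from math import hypot


def voronoi_cell(point, antennas):
    """Return the id of the antenna whose Voronoi cell contains `point`.

    antennas: dict mapping antenna id -> (x, y) in planar coordinates.
    By definition of a Voronoi tessellation, this is the nearest antenna.
    """
    px, py = point
    return min(
        antennas,
        key=lambda a: hypot(antennas[a][0] - px, antennas[a][1] - py),
    )


# Hypothetical antenna positions.
antennas = {"A": (0.0, 0.0), "B": (4.0, 0.0), "C": (0.0, 4.0)}
print(voronoi_cell((3.0, 1.0), antennas))  # B
```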

4.3 Efficiency and Parameters 

Our algorithms are implemented in Python 2.7.5, and all experiments were performed on a 
laptop running Windows 7 with Intel(R) Core(TM) i7-2720QM CPU@2.20GHz (2 cores) and 
8GB memory. All related experiments are run on the first two-week dataset, which contains
about two million historical CDR records. We run each experiment 100 times with a specific
parameter setting to obtain both the average running time and its standard deviation. In general,
the algorithms are efficient: they take only about 30 seconds to two minutes on two million
CDR records. Furthermore, the running time of our methods is stable across the different runs.
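
The timing protocol described above can be sketched as follows. This is not the original benchmarking script: it is written for Python 3 rather than the Python 2.7.5 used in our experiments, and the workload passed to the hypothetical helper `time_runs` is a placeholder standing in for one detection run.

```python
import time
from statistics import mean, stdev


def time_runs(func, runs=100):
    """Call `func` `runs` times (runs >= 2) and return the mean and the sample
    standard deviation of the wall-clock running time in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return mean(durations), stdev(durations)


# Placeholder workload standing in for one run of the detection algorithm.
avg, sd = time_runs(lambda: sum(range(10000)), runs=100)
print("%.6f s +/- %.6f s" % (avg, sd))
```

A small standard deviation relative to the mean is what we refer to as stable execution across runs.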

We evaluate the influence of the parameter settings on the number of detected unusual crowds
and discuss guidelines for choosing them. We find that the algorithm is
particularly sensitive to ε_lt, ε_p, and ε_si. ε_lt is indicative of the duration of moving crowds. Based
on the definition of commitment, a larger ε_p can produce more compact crowds, which have a
much higher probability of being unusual events. Finally, the lower the similarity threshold ε_si between
regular mobility profiles and specific trajectories, the more the crowds will consist of people
whose mobility behavior differs from their typical profiles.

We would like to point out that there is no unique way to optimally select the values
of the parameters, as this strongly depends on the end-user application. For instance, if a city
manager is interested in monitoring visitors to a museum, she might set different values of
ε_lt and ε_p, compared to monitoring a protest. We develop a visual analytics tool, described
in Section 5, to help end users explore and test the detected results under different parameter
settings for different applications.

Figure 5. Real-time view of city map and statistics.

5 Visual Analytics Prototype System 

We have developed a visual analytics system to support the exploration of the unusual crowd 
events based on the proposed framework. The system allows end users, such as analysts and
city managers, to analyze the formation and evolution of crowds and to study the impact of
different parameters on the obtained results, heuristically suggesting possible changes to get
more meaningful results depending on the desired application. The interface consists mainly
of two components: the map overview of the observed city (Figure 5(a)) and the statistics of
users, crowds, and events (Figure 5(b)).

Map view. In the map, the system visualizes the latest clusters, crowds and unusual events
detected in the form of polygons, as shown in Figure 5(a). The polygons are the convex hulls of
the location updates of the users belonging to one of the aforementioned groups. The clusters
detected by the system at a given timestamp are visualized on the map as green polygons. On
mouse-over, the UI shows a pop-up window with the cluster attributes, including 1) timestamp:
the timestamp when the cluster was detected, 2) #users: the number of users that are part of
the cluster, 3) area: the area covered by the cluster polygon in square kilometers, 4) density:
the ratio between the number of users in the cluster and its area, and 5) POIs: the list of Points
Of Interest located within the cluster area. The detected crowds and unusual events are
visualized on the map as blue and red polygons, respectively. Similar to the cluster visualization,
a set of properties can be shown in a pop-up window.

Figure 6. Analyst statistics view.
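
The polygon, area and density attributes described above can be derived as in the following Python 3 sketch. It is not the code of the prototype: the function names are hypothetical, the hull is computed with Andrew's monotone chain algorithm, and user locations are assumed to be already projected to planar coordinates in kilometers.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula; the area is in squared coordinate units."""
    total = 0.0
    for i in range(len(vertices)):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % len(vertices)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def cluster_attributes(user_locations):
    """user_locations: dict user id -> (x, y) in km (assumed planar coordinates)."""
    hull = convex_hull(list(user_locations.values()))
    area = polygon_area(hull)
    density = len(user_locations) / area if area > 0 else float("inf")
    return {"users": len(user_locations), "area": area, "density": density}


# Four users on the corners of a 2 km x 2 km square and one in the middle.
locs = {"u1": (0, 0), "u2": (2, 0), "u3": (2, 2), "u4": (0, 2), "u5": (1, 1)}
print(cluster_attributes(locs))  # {'users': 5, 'area': 4.0, 'density': 1.25}
```

Clusters whose users all share one location or lie on a line have a zero-area hull; the sketch reports an infinite density in that case, which is one possible convention among several.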

Statistics view. We envision the tool being used by two different types of actors: the Analyst,
who is in charge of setting up the analytics system, and the City manager, who has to take
actions based on the identified unusual crowd events. The City manager tab contains a single
time-series graph representing the number of clusters, crowds, and unusual events detected at
every timestamp, as displayed in Figure 5(b). This tab contains the most crucial outcome of
the analytics performed and provides an intuitive way to represent the most recent mobility
patterns of the city. The Analyst tab contains a richer set of statistics, shown in Figure 6. In order
to make efficient use of the available space, we fit the graphs into collapsible panels grouped by
semantically related statistics, including 1) Cumulative: the cumulative trends of the detected
clusters, crowds and unusual events, 2) Detection per minute: the time series of the number of
clusters, candidate crowds, crowds, and unusual events, 3) Event monitoring: the time series of
the maximum and minimum values of lifetime, number of committed users, total number of users,
and similarity of the candidate crowds, 4) Cluster monitoring: the time series of the maximum
size of the detected clusters and their minimum spatial radius. In addition, a red dashed line
corresponding to each parameter is shown to depict parameter efficacy (e.g. event monitoring,
cluster monitoring). This allows the Analyst to understand the role of each parameter in the
obtained results and to set the most appropriate parameters for the specific application in scope.

6 Related Work

The availability of mobility data has offered researchers the opportunity to analyze both
individuals' and groups' movement behaviors. In [laiii], the authors defined a methodology to
extract dense areas in spatio-temporal databases, thus identifying where and when dense areas
of mobile objects appear. A similar definition was proposed in m, where the authors
introduced the concept of moving clusters. Following these ideas, several group and cluster mobility
pattern models have been proposed (UHnillHET] . For instance, in [2l[TH[28l[29] the concept of
flock is widely investigated. A flock is a group of objects that travel together within a disc of
some user-specified size for at least k consecutive timestamps. The main limit of this model is
that a simple circular shape does not reflect natural grouping in reality. A slightly different
concept, the convoy, was introduced in [EllIB], where density-based clustering is adopted instead
of a disc of fixed radius. Li et al. [18] proposed a more general type of trajectory pattern, the
swarm. A swarm is a cluster of moving objects that lasts for at least k timestamps, possibly
non-consecutive. Another group mobility model, called gathering, was introduced in [32]; its
novelty lies in the introduction of the concept of commitment.
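
The difference between the consecutive-timestamp requirement of flocks and convoys and the relaxed requirement of swarms can be illustrated with the following Python 3 sketch. It is a simplification under stated assumptions: the per-timestamp clusters are taken as given sets of object identifiers, and the disc and density constraints of the original definitions are ignored. All names are hypothetical.

```python
def together_flags(group, snapshots):
    """For each timestamp, True if all objects in `group` share one cluster.

    snapshots: list (one entry per timestamp) of lists of sets of object ids.
    """
    group = set(group)
    return [any(group <= cluster for cluster in clusters) for clusters in snapshots]


def is_consecutive_pattern(group, snapshots, k):
    """Flock/convoy-style test: together for at least k consecutive timestamps."""
    run = best = 0
    for flag in together_flags(group, snapshots):
        run = run + 1 if flag else 0
        best = max(best, run)
    return best >= k


def is_swarm_pattern(group, snapshots, k):
    """Swarm-style test: together for at least k timestamps, gaps allowed."""
    return sum(together_flags(group, snapshots)) >= k


# Objects a and b are together at timestamps 0, 1, 3 and 4, but not at 2.
snapshots = [
    [{"a", "b", "c"}],
    [{"a", "b"}, {"c"}],
    [{"a", "c"}, {"b"}],
    [{"a", "b", "c"}],
    [{"a", "b"}],
]
print(is_consecutive_pattern({"a", "b"}, snapshots, 3))  # False
print(is_swarm_pattern({"a", "b"}, snapshots, 3))        # True
```

A single missing observation breaks a consecutive pattern but not a swarm, which matters for sparse data such as CDRs, where a user is observed only when the phone is used.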

Other interesting works dealing with the detection of anomalies in city traffic flow are
presented in [6, 21, 25]. In [21], the authors use the likelihood ratio test statistic (LRT) on GPS
trajectories of taxis to detect traffic flow anomalies. The transportation mode detection problem
is studied using mobile phone data and GIS data in [23]. A passive route sensing framework is introduced
in [20] to monitor users' significant driving routes with low-power sensors in mobile phones. However,
these works do not address the problem of detecting unusual events by considering people's mobility,
but focus on traffic flow analysis through an aggregation of the information. In
contrast, in this paper we are interested in detecting events that involve a large number of
people whose current mobility differs from their typical one.

All the above works are designed and tested on high-resolution trajectory data, such as that
provided by GPS systems. Low-resolution location data collected by telecommunication
operators, on the other hand, is much more pervasive, resulting in a much larger sample of the
population being monitored; see [1, 5, 9, 13]. In this paper, we propose a new method to mine
coarse-grained mobile phone data (in the form of CDRs) to detect unusual crowd events. Indeed,
the aim of our work is to detect events that involve a large number of people performing unusual
activities. To do so, we compute a similarity between the mobility profile of the users and their
trajectories in the group pattern. This extends the concept of commitment, since users need
to be committed and to have trajectories that differ from the ones in their mobility profiles.
Moreover, our method is able to identify moving events that span several locations over time and
involve a subset of committed users, something that could not be detected by using the methods
in [26, 30].
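
The idea of comparing a user's trajectory in a group pattern with the user's mobility profile can be sketched as below. This Python 3 example does not reproduce the similarity measure defined in this paper: the profile format (hour of day mapped to typically visited antennas), the matching rule, the threshold value and all names are assumptions made purely for illustration.

```python
def profile_similarity(trajectory, profile):
    """Fraction of trajectory observations that agree with the user's profile.

    trajectory: list of (hour, antenna_id) observations during the event.
    profile: dict hour -> set of antenna ids the user typically visits then.
    An empty trajectory gives 1.0, i.e. no evidence of unusual behavior.
    """
    if not trajectory:
        return 1.0
    hits = sum(
        1 for hour, antenna in trajectory if antenna in profile.get(hour, set())
    )
    return hits / len(trajectory)


def is_unusual_for_user(trajectory, profile, threshold):
    """The user behaves unusually when the similarity is below the threshold."""
    return profile_similarity(trajectory, profile) < threshold


# Hypothetical user who is normally at home in the evening.
profile = {18: {"home"}, 19: {"home"}, 20: {"home", "gym"}}
trajectory = [(18, "stadium"), (19, "stadium"), (20, "home")]
print(profile_similarity(trajectory, profile))
print(is_unusual_for_user(trajectory, profile, threshold=0.5))  # True
```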


7 Conclusion and Future Work 

In this paper, we formally define the problem of inferring unusual crowd events from mobility
data. Previous work on event detection is limited to inferring usual events from
fine-grained GPS data. Our problem definition differs by characterizing unusual crowd events
and presenting a new methodology to extract them from coarse-grained CDR data. The main
contributions of this paper w.r.t. existing methods are the ability to analyze temporally and
spatially sparse data such as CDRs and the definition of a subclass of events that are unusual for their
attendees. Our experimental results demonstrate the effectiveness of our method on real-world
mobile data.

Despite the promising results of the present work, there is still much room left for future work. 
First, while the proposed method relies on Visual Analytics to help end users set parameters, we
are planning to design algorithms to determine parameters for specific applications of interest 
as well as an optimization procedure for evaluation metrics. Moreover, we are working toward 
combining mobile and social media data together to detect unusual events. In doing so, we 
have the potential to detect and monitor crowding activities in real time, and eventually yield 
a better and smarter planet. 